{
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {
        "colab": {
            "name": "tutorial_LunarLanderContinuous-v2.ipynb",
            "provenance": [],
            "collapsed_sections": [],
            "include_colab_link": true
        },
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3"
        },
        "accelerator": "GPU"
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "view-in-github",
                "colab_type": "text"
            },
            "source": [
                "<a href=\"https://colab.research.google.com/github/AI4Finance-Foundation/ElegantRL/blob/master/tutorial_LunarLanderContinuous_v2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "c1gUG3OCJ5GS"
            },
            "source": [
                "# **LunarLanderContinuous-v2 Example in ElegantRL**\n",
                "\n",
                "\n",
                "\n",
                "\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "FGXyBBvL0dR2"
            },
            "source": [
                "# **Task Description**\n",
                "\n",
                "[LunarLanderContinuous-v2](https://gym.openai.com/envs/LunarLanderContinuous-v2) is a robotic control task. The goal is to get a Lander to rest on the landing pad. If lander moves away from landing pad it loses reward back. Episode finishes if the lander crashes or comes to rest, receiving additional -100 or +100 points."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "DbamGVHC3AeW"
            },
            "source": [
                "# **Part 1: Install ElegantRL**"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "U35bhkUqOqbS",
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "outputId": "0acfcd5b-ebfe-4dba-dd65-10f5e9a3a58e"
            },
            "source": [
                "# install elegantrl library\n",
                "!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git"
            ],
            "execution_count": 1,
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stdout",
                    "text": [
                        "Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git\n",
                        "  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-q0f_9pry\n",
                        "  Running command git clone -q https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-q0f_9pry\n",
                        "Requirement already satisfied: gym in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.3) (0.17.3)\n",
                        "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.3) (3.2.2)\n",
                        "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.3) (1.19.5)\n",
                        "Collecting pybullet\n",
                        "  Downloading pybullet-3.2.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (90.8 MB)\n",
                        "\u001b[K     |████████████████████████████████| 90.8 MB 291 bytes/s \n",
                        "\u001b[?25hRequirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.3) (1.10.0+cu111)\n",
                        "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.3) (4.1.2.30)\n",
                        "Collecting box2d-py\n",
                        "  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)\n",
                        "\u001b[K     |████████████████████████████████| 448 kB 70.5 MB/s \n",
                        "\u001b[?25hRequirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.3) (1.4.1)\n",
                        "Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.3) (1.5.0)\n",
                        "Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.3) (1.3.0)\n",
                        "Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from pyglet<=1.5.0,>=1.4.0->gym->elegantrl==0.3.3) (0.16.0)\n",
                        "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.3) (0.11.0)\n",
                        "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.3) (3.0.6)\n",
                        "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.3) (1.3.2)\n",
                        "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.3) (2.8.2)\n",
                        "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->elegantrl==0.3.3) (1.15.0)\n",
                        "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->elegantrl==0.3.3) (3.10.0.2)\n",
                        "Building wheels for collected packages: elegantrl\n",
                        "  Building wheel for elegantrl (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
                        "  Created wheel for elegantrl: filename=elegantrl-0.3.3-py3-none-any.whl size=183567 sha256=a2b2116b1f175b6cad721c1dfeba30620a45100aa1a7e65fe9c19df809fe68d3\n",
                        "  Stored in directory: /tmp/pip-ephem-wheel-cache-ltz2mxds/wheels/52/9a/b3/08c8a0b5be22a65da0132538c05e7e961b1253c90d6845e0c6\n",
                        "Successfully built elegantrl\n",
                        "Installing collected packages: pybullet, box2d-py, elegantrl\n",
                        "Successfully installed box2d-py-2.3.8 elegantrl-0.3.3 pybullet-3.2.1\n"
                    ]
                }
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "UVdmpnK_3Zcn"
            },
            "source": [
                "# **Part 2: Import Packages**\n",
                "\n",
                "\n",
                "*   **elegantrl**\n",
                "*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.\n",
                "\n"
            ]
        },
        {
            "cell_type": "code",
            "source": [
                "import gym\n",
                "from elegantrl.agents import AgentModSAC\n",
                "from elegantrl.train.config import get_gym_env_args, Arguments\n",
                "from elegantrl.train.run import *\n",
                "\n",
                "gym.logger.set_level(40)  # Block warning"
            ],
            "metadata": {
                "id": "AAPdjovQrTpE"
            },
            "execution_count": 2,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "source": [
                "# **Part 3: Get environment information**"
            ],
            "metadata": {
                "id": "z2Ik5cDoyPGU"
            }
        },
        {
            "cell_type": "code",
            "source": [
                "get_gym_env_args(gym.make(\"LunarLanderContinuous-v2\"), if_print=False)"
            ],
            "metadata": {
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "id": "wwkZXiHtyV6f",
                "outputId": "24720b42-25ca-4491-8fff-c6e7470dedf5"
            },
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {
                        "text/plain": [
                            "{'action_dim': 2,\n",
                            " 'env_name': 'LunarLanderContinuous-v2',\n",
                            " 'env_num': 1,\n",
                            " 'if_discrete': False,\n",
                            " 'max_step': 1000,\n",
                            " 'state_dim': 8,\n",
                            " 'target_return': 200}"
                        ]
                    },
                    "metadata": {},
                    "execution_count": 3
                }
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "3n8zcgcn14uq"
            },
            "source": [
                "# **Part 4: Specify Agent and Environment**\n",
                "\n",
                "*   **agent**: chooses a agent (DRL algorithm) from a set of agents in the [directory](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl/agents).\n",
                "*   **env_func**: the function to create an environment, in this case, we use gym.make to create BipedalWalker-v3.\n",
                "*   **env_args**: the environment information.\n"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "E03f6cTeajK4"
            },
            "source": [
                "env_func = gym.make\n",
                "env_args = {\n",
                "    \"env_num\": 1,\n",
                "    \"env_name\": \"LunarLanderContinuous-v2\",\n",
                "    \"max_step\": 1000,\n",
                "    \"state_dim\": 8,\n",
                "    \"action_dim\": 2,\n",
                "    \"if_discrete\": False,\n",
                "    \"target_return\": 200,\n",
                "    \"id\": \"LunarLanderContinuous-v2\",\n",
                "}\n",
                "args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)"
            ],
            "execution_count": 4,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "source": [
                "# **Part 4: Specify hyper-parameters**\n",
                "A list of hyper-parameters is available [here](https://elegantrl.readthedocs.io/en/latest/api/config.html)."
            ],
            "metadata": {
                "id": "rcFcUkwfzHLE"
            }
        },
        {
            "cell_type": "code",
            "source": [
                "args.target_step = args.max_step\n",
                "args.gamma = 0.99\n",
                "args.eval_times = 2**5\n",
                "args.random_seed = 2022"
            ],
            "metadata": {
                "id": "9WCAcmIfzGyE"
            },
            "execution_count": 7,
            "outputs": []
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "z1j5kLHF2dhJ"
            },
            "source": [
                "# **Part 5: Train and Evaluate the Agent**\n",
                "\n",
                "\n",
                "\n",
                "\n"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {
                "id": "KGOPSD6da23k",
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "outputId": "84157a77-bcfa-406d-c25e-d3dea0e8fc20"
            },
            "source": [
                "train_and_evaluate(args)"
            ],
            "execution_count": 8,
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stdout",
                    "text": [
                        "| Arguments Remove cwd: ./LunarLanderContinuous-v2_ModSAC_0\n",
                        "################################################################################\n",
                        "ID     Step    maxR |    avgR   stdR   avgS  stdS |    expR   objC   etc.\n",
                        "0  1.02e+03 -124.42 |\n",
                        "0  1.02e+03 -124.42 | -124.42   42.8     70    13 |   -1.82   0.89   0.13   0.15\n",
                        "0  5.96e+04   71.75 |\n",
                        "0  5.96e+04   71.75 |   71.75  126.5    553   217 |   -0.10   1.62   3.90   0.18\n",
                        "0  8.61e+04   71.75 |  -12.89  112.4    838   178 |    0.03   1.50  17.20   0.21\n",
                        "0  9.90e+04   71.75 |  -17.51  121.6    647   368 |   -0.25   1.46  20.98   0.24\n",
                        "0  1.11e+05   71.75 |  -46.97  103.4    589   381 |    0.10   1.25  25.45   0.27\n",
                        "0  1.24e+05   71.75 |   18.73  107.8    468   396 |    0.08   1.34  29.77   0.31\n",
                        "0  1.41e+05  126.34 |\n",
                        "0  1.41e+05  126.34 |  126.34   93.8    731   189 |    0.09   1.30  30.89   0.38\n",
                        "0  1.53e+05  126.34 |  105.38  115.7    760   220 |    0.14   1.39  32.72   0.45\n",
                        "0  1.64e+05  126.34 |  120.10   97.1    781   211 |    0.00   1.53  34.34   0.53\n",
                        "0  1.73e+05  126.98 |\n",
                        "0  1.73e+05  126.98 |  126.98  104.9    697   265 |    0.05   1.60  42.42   0.61\n",
                        "0  1.84e+05  126.98 |  124.46   87.9    814   218 |    0.13   1.79  42.35   0.68\n",
                        "0  1.93e+05  179.97 |\n",
                        "0  1.93e+05  179.97 |  179.97   66.7    593   238 |    0.10   1.96  48.47   0.72\n",
                        "0  2.06e+05  179.97 |  123.89   99.6    618   295 |    0.12   1.80  58.22   0.71\n",
                        "0  2.18e+05  179.97 |  140.68  103.4    594   267 |   -0.01   1.77  72.17   0.68\n",
                        "0  2.30e+05  179.97 |  176.68   66.3    600   211 |    0.08   2.07  70.53   0.64\n",
                        "0  2.41e+05  179.97 |  143.05  116.6    532   296 |    0.11   1.72  70.40   0.63\n",
                        "0  2.52e+05  179.97 |  159.72   93.9    648   237 |    0.13   1.75  65.22   0.62\n",
                        "0  2.61e+05  179.97 |  177.75   86.6    588   211 |   -0.02   1.69  67.94   0.61\n",
                        "0  2.70e+05  179.97 |  175.77   61.6    679   137 |    0.12   1.77  67.64   0.60\n",
                        "0  2.78e+05  179.97 |  174.47  109.0    596   209 |    0.11   1.52  71.18   0.60\n",
                        "0  2.88e+05  182.06 |\n",
                        "0  2.88e+05  182.06 |  182.06   90.0    594   218 |    0.10   1.64  67.75   0.58\n",
                        "0  2.96e+05  182.06 |  107.21   84.6    852   162 |    0.02   1.30  67.73   0.57\n",
                        "0  3.00e+05  182.06 |  132.97   80.9    821   136 |    0.10   1.50  67.93   0.57\n",
                        "0  3.04e+05  182.06 |  179.96   58.0    692   176 |    0.06   1.48  66.47   0.56\n",
                        "0  3.11e+05  182.06 |  126.66   68.9    852   138 |    0.08   1.67  64.73   0.56\n",
                        "0  3.15e+05  182.06 |   96.49  100.4    844   177 |    0.00   1.28  61.74   0.55\n",
                        "0  3.19e+05  182.06 |   51.84   87.5    918   128 |    0.08   1.35  59.62   0.55\n",
                        "0  3.23e+05  182.06 |   89.50   83.5    890   125 |    0.10   1.39  59.16   0.54\n",
                        "0  3.27e+05  182.06 |   84.80   87.7    921    90 |    0.03   1.55  62.51   0.53\n",
                        "0  3.30e+05  182.06 |   71.72   78.7    935   101 |    0.07   1.42  66.60   0.52\n",
                        "0  3.34e+05  182.06 |   99.71  109.3    811   195 |    0.07   1.45  61.71   0.52\n",
                        "0  3.39e+05  182.06 |  116.89   94.0    797   193 |    0.13   1.40  60.26   0.51\n",
                        "0  3.44e+05  182.06 |   78.11   98.1    871   169 |    0.15   1.23  60.31   0.50\n",
                        "0  3.48e+05  182.06 |  178.17   69.6    631   207 |   -0.00   1.15  58.81   0.49\n",
                        "0  3.55e+05  182.06 |  134.00  106.0    716   252 |    0.04   1.41  56.00   0.48\n",
                        "0  3.61e+05  197.49 |\n",
                        "0  3.61e+05  197.49 |  197.49   47.8    581   167 |    0.11   1.32  54.38   0.46\n",
                        "0  3.68e+05  197.49 |  172.83   88.3    645   254 |    0.14   1.30  51.69   0.45\n",
                        "0  3.75e+05  234.16 |\n",
                        "0  3.75e+05  234.16 |  234.16   30.4    417   168 |    0.12   1.18  55.01   0.44\n",
                        "| UsedTime: 4920 | SavedDir: ./LunarLanderContinuous-v2_ModSAC_0\n",
                        "| ReplayBuffer save in: ./LunarLanderContinuous-v2_ModSAC_0/replay_0.npz\n"
                    ]
                }
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "JPXOxLSqh5cP"
            },
            "source": [
                "Understanding the above results::\n",
                "*   **Step**: the total training steps.\n",
                "*  **MaxR**: the maximum reward.\n",
                "*   **avgR**: the average of the rewards.\n",
                "*   **stdR**: the standard deviation of the rewards.\n",
                "*   **objA**: the objective function value of Actor Network (Policy Network).\n",
                "*   **objC**: the objective function value (Q-value)  of Critic Network (Value Network)."
            ]
        }
    ]
}